News Article Classification Based on a Vector Representation Including Words’ Collocations
Abstract
In this paper we propose including collocations in the pre-processing stage of text mining, which we use for fast news article recommendation, and we report experiments based on real data from the biggest Slovak newspaper. The section of a news article can be predicted from several of the article's characteristics, such as its name, content, and keywords. We conducted experiments comparing several approaches and algorithms, including an expressive vector representation that takes into account the most popular word collocations obtained from the Slovak National Corpus.
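The idea of a vector representation that includes collocations can be illustrated with a minimal sketch. It is not the authors' implementation: the collocation list, the function names, and the sample sentences are all hypothetical, and plain term counts stand in for whatever weighting the paper actually uses. Each adjacent word pair found in the collocation list is merged into a single token before the document vectors are built.

```python
from collections import Counter

# Hypothetical collocation list; the paper draws its collocations from the
# Slovak National Corpus, which is not reproduced here.
COLLOCATIONS = {("prime", "minister"), ("ice", "hockey")}


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(".,;:!?\"'()") for word in text.lower().split())
    return [t for t in tokens if t]


def merge_collocations(tokens, collocations=COLLOCATIONS):
    """Replace each adjacent pair found in `collocations` with one joined token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in collocations:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def vectorize(documents):
    """Return (vocabulary, count vectors) built from collocation-aware tokens."""
    counts = [Counter(merge_collocations(tokenize(doc))) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    vectors = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, vectors


docs = [
    "The prime minister watched ice hockey.",
    "Ice hockey fans met the minister.",
]
vocabulary, vectors = vectorize(docs)
print(vocabulary)
print(vectors)
```

In this sketch "prime minister" becomes one feature, distinct from the standalone word "minister" in the second sentence, which is the kind of distinction a collocation-aware representation is meant to preserve. The resulting vectors could then be passed to any standard classifier to predict the article section.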
Similar resources
A New Document Embedding Method for News Classification
Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A great deal of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...
Polarimetric Synthetic Aperture Radar Image Classification using Bag of Visual Words Algorithm
Land cover is defined as the physical material of the surface of the earth, including different vegetation covers, bare soil, water surface, various urban areas, etc. Land cover and its changes are very important and influential on the Earth and life of living organisms, especially human beings. Land cover change monitoring is important for protecting the ecosystem, forests, farmland, open spac...
Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents
Beyond its own merits, text classification (TC) has become a cornerstone in many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for the Arabic text process. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...
Thematic analysis of the news of the 2020 Tokyo Olympics with emphasis on gender (case study: Shargh newspaper)
abstract: The purpose of writing this article is to thematically analyze the news of the 2020 Tokyo Olympics by emphasizing gender and presenting an indigenous model of its related components using the theories of experts. The text of the Tokyo 2020 Olympic event is in Shargh 1400 newspaper (August 1 - August 17) which is a purposeful sampling, first based on commonalities, related them...